Cox Proportional Hazards Model

Hermela Shimelis and Kayla Boyd

November 18, 2024

Outline

  • Introduction of Cox regression

  • Two examples

    • Survival after chemotherapy of colon cancer patients
    • Kayla’s data

Cox proportional hazards model

  • A popular regression modeling method used to explore the relationship between survival time and covariates.


  • It assumes that each covariate's effect on the hazard is constant over time.


  • The survival time can be the time until a symptom develops, the time to relapse after remission, or the time to death [1].


Cox proportional hazards model

  • The Cox regression model expresses the hazard at time \(t\) for an individual with covariates \(x_1, \dots, x_p\) as [2]:

    • \(h_x(t)=h_0(t)\exp(\beta_1x_1 +\beta_2x_2 + \dots + \beta_p x_p)\)

    • Where:

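      • \(h_x(t)\) is the hazard at time \(t\) for an individual with covariate values \(x\)

      • \(h_0(t)\) is the baseline hazard, i.e., the hazard when all covariates equal 0; the model leaves its shape unspecified

      • \(\beta_1x_1 + \beta_2x_2 + \dots +\beta_p x_p\) is the linear predictor: the covariates \(x_1, \dots, x_p\) weighted by their regression coefficients \(\beta_1, \dots, \beta_p\)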
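The formula can be evaluated directly. A minimal Python sketch, assuming a hypothetical baseline hazard h0 and hypothetical coefficients chosen only for illustration (in a fitted Cox model \(h_0(t)\) is left unspecified and only the \(\beta\)s are estimated):

```python
import math

def cox_hazard(t, x, beta, baseline_hazard):
    """h_x(t) = h_0(t) * exp(beta_1*x_1 + ... + beta_p*x_p)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

def h0(t):
    # Hypothetical baseline hazard that rises linearly with time (illustration only).
    return 0.01 * t

beta = [0.5, -0.2]  # hypothetical coefficients for two covariates
print(cox_hazard(10.0, [1, 2], beta, h0))  # 0.1 * exp(0.1), about 0.1105
print(cox_hazard(10.0, [0, 0], beta, h0))  # all covariates 0: equals h0(10) = 0.1
```

Setting every covariate to 0 recovers the baseline hazard, which is why \(h_0(t)\) is called the baseline.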

Proportional Hazards Assumption

  • The proportional hazards assumption states that the ratio of the hazards of any two individuals is constant over time; equivalently, each coefficient \(\beta_j\) does not change with time [3].

Hazard Ratios

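  • The hazard function \(h(t)\) is the instantaneous rate of the event (e.g., death) at time \(t\) among individuals who have not yet experienced it; for a short interval \(\Delta t\), \(h(t)\,\Delta t\) approximates the probability of the event in \([t, t + \Delta t)\) given survival to \(t\) [1].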

  • The hazard ratio is used to compare the hazard rate between two groups

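    • \(\mathrm{HR} = h_{x_2}(t) / h_{x_1}(t) = \exp[\beta(x_2 - x_1)]\)

      • where \(h_{x_2}(t)\) and \(h_{x_1}(t)\) are the hazard functions of the two groups, whose covariate values are \(x_2\) and \(x_1\); the baseline hazard \(h_0(t)\) cancels, so the HR does not depend on \(t\)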

    • HR = 1: No difference in hazard rates between the two groups

    • HR >1: Higher hazard rate in the second group compared to the first

    • HR <1: Lower hazard rate in the second group compared to the first
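A short Python sketch of the hazard ratio, assuming a hypothetical baseline hazard and an illustrative coefficient with HR = 0.69. It checks numerically that the baseline cancels, so the ratio is the same at every time point:

```python
import math

def hazard_ratio(beta, x1, x2):
    """HR of a group with covariate value x2 relative to a group with x1: exp[beta * (x2 - x1)]."""
    return math.exp(beta * (x2 - x1))

def hazard(t, x, beta):
    h0 = 0.02 * math.sqrt(t)  # hypothetical baseline hazard; any positive h_0(t) works
    return h0 * math.exp(beta * x)

beta = math.log(0.69)  # illustrative coefficient corresponding to HR = 0.69

# The baseline hazard cancels in the ratio, so the HR is the same at every time t.
for t in (1.0, 5.0, 20.0):
    print(t, round(hazard(t, 1, beta) / hazard(t, 0, beta), 6))  # 0.69 each time

hr = hazard_ratio(beta, 0, 1)
print(f"HR = {hr:.2f}: {100 * (1 - hr):.0f}% lower hazard in group 2 than in group 1")
```

An HR below 1 reads as a percentage reduction in the hazard, 100 × (1 − HR); an HR above 1 reads as an increase, 100 × (HR − 1).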

Time-Varying Coefficients

  • A violation of the proportional hazards assumption means that a covariate's effect on the hazard is not constant over time, i.e., its coefficient varies with time, \(\beta(t)\).

  • Time-varying covariates are used when the value of a covariate itself changes during the follow-up period [4]; they are distinct from time-varying coefficients, where the covariate's effect changes.

  • Internal time-varying covariates are generated by the subject (e.g., a repeatedly measured biomarker), so their values depend on the subject's survival status and can only be observed while the subject is under follow-up [4].

  • External time-varying covariates are determined independently of the subject under study (e.g., age or an environmental exposure) [4].
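Time-varying covariates are usually handled by splitting each subject's follow-up into intervals over which the covariate is constant: the (start, stop] "counting process" layout. In R, the survival package's tmerge() and survSplit() build such data. The Python sketch below shows the idea for a single subject; the function name, biomarker values, and days are hypothetical:

```python
def split_follow_up(stop_time, event, changes):
    """Split one subject's follow-up into (start, stop, value, event) intervals.

    stop_time: end of follow-up; event: 1 if the event occurred at stop_time, else 0.
    changes: (time, value) pairs sorted by time, the first at time 0.
    The event flag is attached only to the final interval; earlier intervals get 0.
    """
    rows = []
    for k, (start, value) in enumerate(changes):
        if start >= stop_time:
            break  # covariate changes after follow-up ends are ignored
        stop = changes[k + 1][0] if k + 1 < len(changes) else stop_time
        stop = min(stop, stop_time)
        rows.append((start, stop, value, event if stop == stop_time else 0))
    return rows

# Hypothetical subject: biomarker 1.2 at baseline, 2.5 from day 30, died on day 75.
print(split_follow_up(75, 1, [(0, 1.2), (30, 2.5)]))
# [(0, 30, 1.2, 0), (30, 75, 2.5, 1)]
```

Each row is then treated as a separate (left-truncated) observation, so the subject contributes to a risk set only with the covariate value in effect at that time.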

R packages used for survival analysis

Package     Description
survival    Fits and analyzes survival models
            Fits Kaplan-Meier survival curves
survminer   Plots Kaplan-Meier survival curves using ggplot2
            Plots Schoenfeld residuals

Cox regression modeling of survival after chemotherapy for colon cancer

  • Data: Survival after chemotherapy for Stage B/C colon cancer (colon dataset in the R survival package) [6]

  • Goal: Model the relationship between survival time and treatment groups

  • Predictors

Category                  Variables
Treatments (rx)           Observation, no treatment (Obs)
                          Levamisole (Lev)
                          Levamisole + 5-FU (Lev+5FU)
Patient characteristics   Age
                          Sex
Tumor characteristics     Colon perforation and obstruction
                          Adherence to nearby organs
                          Tumor differentiation
                          Local spread
Other                     Time from surgery to registration, short/long (surg)
                          More than 4 positive lymph nodes (node4)

Kaplan-Meier curve stratified by treatment groups
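The Kaplan-Meier curve is a product-limit estimate, \(\hat S(t) = \prod_{t_i \le t} (1 - d_i / n_i)\), where \(d_i\) deaths occur among \(n_i\) subjects at risk at event time \(t_i\); in R, survival's survfit() computes it. A minimal Python version for one group is sketched below (fit it once per treatment group for a stratified plot); the toy data are made up:

```python
def kaplan_meier(times, events):
    """Return [(event time, S(t) just after it)] using the product-limit formula."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group subjects with tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # both deaths and censorings leave the risk set
    return curve

# Made-up toy data: follow-up times and event indicators (1 = death, 0 = censored).
for t, s in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]):
    print(t, round(s, 3))  # 1 0.833 / 2 0.667 / 3 0.444 / 5 0.0
```

Censored subjects do not lower the curve; they only shrink the risk set for later event times.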

Cox regression models

  1. Base model: No predictors

  2. Univariate: Treatment

  3. Full variables: All predictors

  4. Significant predictors: stepwise-selected variables

  5. Final model: Stratified
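Cox models are fitted by maximizing the partial likelihood, in which the baseline hazard \(h_0(t)\) cancels out. The Python sketch below fits a single covariate by Newton-Raphson. It uses Breslow's handling of tied event times, whereas coxph() in the R survival package defaults to the Efron approximation, so estimates can differ slightly when ties are present. The data are made up:

```python
import math

def partial_loglik(beta, times, events, x):
    """Cox log partial likelihood for a single covariate (Breslow handling of ties)."""
    ll = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if di:
            # Risk set: everyone still under observation at time ti.
            denom = sum(math.exp(beta * x[j]) for j in range(len(times)) if times[j] >= ti)
            ll += beta * x[i] - math.log(denom)
    return ll

def fit_cox_1d(times, events, x, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood by Newton-Raphson; returns (beta, standard error)."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for i, (ti, di) in enumerate(zip(times, events)):
            if not di:
                continue
            risk = [j for j in range(len(times)) if times[j] >= ti]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0           # first derivative of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2  # minus the second derivative
        if info <= 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(info)

# Made-up data; x is a hypothetical binary covariate (1 = exposed, 0 = not).
times = [3, 4, 5, 6, 8, 9, 10, 12]
events = [1, 0, 1, 1, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0]
b, se = fit_cox_1d(times, events, x)
print(f"beta = {b:.3f}, HR = {math.exp(b):.2f}, "
      f"95% CI = ({math.exp(b - 1.96 * se):.2f}, {math.exp(b + 1.96 * se):.2f})")
```

The multi-covariate fits in Models 2 to 5 follow the same idea with a vector score and an information matrix.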

Model 2: Univariate model

Characteristic          HR      95% CI        p-value
rx
    Obs (reference)
    Lev                 0.97    0.78, 1.21    0.8
    Lev+5FU             0.69    0.55, 0.87    0.002
Concordance = 0.536
HR = hazard ratio; CI = confidence interval
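The HR, 95% CI, and p-value in each row are linked: for a Wald interval the CI is symmetric around log(HR). The Python sketch below, assuming Wald intervals, recovers the standard error and two-sided p-value from the rounded Model 2 numbers; small discrepancies from the table come from rounding. For Lev+5FU, HR = 0.69 corresponds to an estimated 31% lower hazard than in the Obs group.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-HR standard error, z statistic, and two-sided p-value from a 95% CI.

    Assumes a Wald interval that is symmetric on the log scale: exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
    return se, z, p

# Model 2 estimates (rounded in the table, so recovered values are approximate).
for label, hr, lo, hi in [("Lev", 0.97, 0.78, 1.21), ("Lev+5FU", 0.69, 0.55, 0.87)]:
    se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```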

Model 3: All predictors

Characteristic          HR      95% CI        p-value
rx
    Obs (reference)
    Lev                 0.98    0.79, 1.22    0.9
    Lev+5FU             0.69    0.54, 0.87    0.002
age                     1.01    1.00, 1.02    0.083
sex                     1.04    0.86, 1.26    0.7
perfor                  1.00    0.59, 1.70    >0.9
adhere                  1.18    0.92, 1.53    0.2
surg                    1.27    1.03, 1.55    0.022
obstruct                1.33    1.06, 1.68    0.015
differentiation
    moderate (reference)
    poor                1.43    1.13, 1.82    0.003
    well                1.08    0.78, 1.50    0.6
node4                   2.55    2.10, 3.09    <0.001
local_spread
    contiguous (reference)
    muscle              0.39    0.23, 0.64    <0.001
    serosa              0.64    0.43, 0.94    0.023
    submucosa           0.29    0.10, 0.83    0.021
Concordance = 0.674
HR = hazard ratio; CI = confidence interval

Model 4: Stepwise-selected variables

Characteristic          HR      95% CI        p-value
rx
    Obs (reference)
    Lev                 0.99    0.80, 1.23    >0.9
    Lev+5FU             0.69    0.54, 0.87    0.002
age                     1.01    1.00, 1.02    0.069
surg                    1.28    1.04, 1.56    0.018
obstruct                1.33    1.06, 1.67    0.015
differentiation
    moderate (reference)
    poor                1.45    1.15, 1.84    0.002
    well                1.07    0.77, 1.48    0.7
node4                   2.53    2.09, 3.07    <0.001
local_spread
    contiguous (reference)
    muscle              0.37    0.23, 0.61    <0.001
    serosa              0.61    0.41, 0.89    0.010
    submucosa           0.27    0.09, 0.76    0.014
Concordance = 0.672
HR = hazard ratio; CI = confidence interval
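The slides do not say which stepwise procedure produced Model 4. One common choice, assumed here, is backward elimination by AIC (the criterion R's step() uses by default); the drop from Model 3 (AIC 5741.4) to Model 4 (AIC 5737.3) is consistent with that. In the Python sketch below, aic_of is a hypothetical placeholder for refitting the Cox model on a subset of variables:

```python
def backward_stepwise(variables, aic_of):
    """Backward elimination: repeatedly drop the variable whose removal lowers AIC most.

    aic_of: hypothetical callback taking a frozenset of variable names and returning
    the AIC of the model fitted with those variables.
    """
    current = frozenset(variables)
    best_aic = aic_of(current)
    while current:
        cand_aic, drop = min((aic_of(current - {v}), v) for v in current)
        if cand_aic >= best_aic:
            break  # no single removal improves AIC
        current, best_aic = current - {drop}, cand_aic
    return current, best_aic

# Toy AIC (hypothetical): variable v lowers -2 logL by gain[v] and costs 2 (one parameter).
gain = {"a": 10.0, "b": 5.0, "c": 1.0, "d": 0.5}

def toy_aic(vars_):
    return 100.0 - sum(gain[v] for v in vars_) + 2 * len(vars_)

selected, best = backward_stepwise(gain, toy_aic)
print(sorted(selected), best)  # ['a', 'b'] 89.0
```

In a real Cox fit, a factor such as rx or local_spread costs one parameter per non-reference level, so dropping it changes the AIC penalty by 2 × (number of levels − 1).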

Model 4 does not meet the proportional hazards assumption (GLOBAL p < 0.001)

Schoenfeld Residuals Test Results
Variable          chisq    df    p
rx                2.335    2     0.311
age               0.549    1     0.459
surg              0.020    1     0.888
obstruct          6.148    1     0.013
differentiation   17.459   2     <0.001
node4             5.662    1     0.017
local_spread      7.083    3     0.069
GLOBAL            37.525   11    <0.001
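Each p-value in the Schoenfeld residuals test table is the upper tail of a chi-square distribution with the listed degrees of freedom (in R, survival's cox.zph() reports these tests and survminer's ggcoxzph() plots the residuals). The Python sketch below recomputes the p-values from chisq and df with only the standard library and flags terms below 0.05:

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete gamma
    function, evaluated with its power series.
    """
    if x <= 0:
        return 1.0
    s, h = df / 2, x / 2
    term = total = 1.0 / s
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= h / (s + n)
        total += term
    lower = math.exp(s * math.log(h) - h - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)

# (term, chisq, df) from the Model 4 test table above.
rows = [("rx", 2.335, 2), ("age", 0.549, 1), ("surg", 0.020, 1), ("obstruct", 6.148, 1),
        ("differentiation", 17.459, 2), ("node4", 5.662, 1), ("local_spread", 7.083, 3),
        ("GLOBAL", 37.525, 11)]
for name, chisq, df in rows:
    p = chi2_sf(chisq, df)
    flag = "PH violated at 0.05" if p < 0.05 else ""
    print(f"{name:16s} p = {p:.3f} {flag}")
```

The flagged terms (obstruct, differentiation, node4, and the global test) match the table.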

Model 5: Stratified Model

Characteristic          HR      95% CI        p-value
rx
    Obs (reference)
    Lev                 0.98    0.79, 1.22    0.9
    Lev+5FU             0.71    0.56, 0.89    0.003
age                     1.01    1.00, 1.02    0.034
surg                    1.30    1.06, 1.59    0.012
node4                   2.50    2.06, 3.04    <0.001
local_spread
    contiguous (reference)
    muscle              0.34    0.21, 0.56    <0.001
    serosa              0.58    0.39, 0.84    0.004
    submucosa           0.24    0.08, 0.67    0.007
Concordance = 0.674
HR = hazard ratio; CI = confidence interval

Stratified model meets the proportional hazards assumption globally (GLOBAL p = 0.134), although node4 remains below 0.05 (p = 0.038)

Schoenfeld Residuals Test Results
Variable          chisq    df    p
rx                2.001    2     0.368
age               0.670    1     0.413
surg              0.014    1     0.905
node4             4.288    1     0.038
local_spread      5.298    3     0.151
GLOBAL            12.411   8     0.134

Model Evaluation Metrics

Model     Description                   AIC        BIC        C-index
Model 1   Base model                    5860.383   5860.383   0.500
Model 2   Treatment                     5852.236   5860.463   0.536
Model 3   Full variables                5741.401   5798.993   0.674
Model 4   Stepwise-selected variables   5737.261   5782.511   0.672
Model 5   Stratified                    4567.829   4600.739   0.674
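Because AIC = −2 logL + 2k and BIC = −2 logL + k ln(n), the table implies the sample size n used by BIC: ln(n) = (BIC − AIC)/k + 2. The Python sketch below counts k as one coefficient per non-reference factor level (our reading of the model tables). The implied n is about 452 for every model; this is consistent with BIC using a single n across models (for coxph fits, R's nobs() returns the number of events, to our understanding). Model 5's much lower AIC should not be read as a large gain in fit: a stratified model's partial likelihood is built only from within-stratum risk sets, so it is likely not on the same scale as the unstratified models.

```python
import math

# (AIC, BIC, number of estimated coefficients k) copied from the evaluation table.
models = {
    "Model 2": (5852.236, 5860.463, 2),
    "Model 3": (5741.401, 5798.993, 14),
    "Model 4": (5737.261, 5782.511, 11),
    "Model 5": (4567.829, 4600.739, 8),
}

def implied_n(aic, bic, k):
    """Solve AIC = -2 logL + 2k and BIC = -2 logL + k ln(n) for n."""
    return math.exp((bic - aic) / k + 2)

for name, (aic, bic, k) in models.items():
    print(name, round(implied_n(aic, bic, k)))  # 452 for each model
```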

K-fold cross validation

Original c-index: 0.6544784 
Mean cross-validated c-Index: 0.6420104 
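The slides do not describe the cross-validation setup. The Python sketch below assumes k shuffled folds with a fixed seed and Harrell's C computed on each held-out fold; fit is a hypothetical callback that refits the Cox model on the training indices and returns a risk score (e.g., the linear predictor) for any subject. In R, survival's concordance() computes the C-index. The small gap between the original and cross-validated values (0.654 vs. 0.642) suggests only modest optimism in the in-sample estimate.

```python
import random

def harrell_c(times, events, risk):
    """Harrell's C: share of comparable pairs in which the earlier event has the higher risk.

    A pair (i, j) is comparable if subject i had the event and times[i] < times[j].
    Ties in the risk score count as 1/2.
    """
    concordant = comparable = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validated_c(times, events, fit, k=5, seed=0):
    """Mean held-out C-index; fit(train_indices) must return a function index -> risk score."""
    scores = []
    for test in k_fold_indices(len(times), k, seed):
        held_out = set(test)
        train = [i for i in range(len(times)) if i not in held_out]
        predict = fit(train)  # hypothetical: refit the model on the training subjects
        scores.append(harrell_c([times[i] for i in test],
                                [events[i] for i in test],
                                [predict(i) for i in test]))
    return sum(scores) / len(scores)
```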

Conclusions

References

[1] S. J. Walters, “Analyzing time to event outcomes with a Cox regression model,” Wiley Interdiscip. Rev. Comput. Stat., vol. 4, no. 3, pp. 310–315, May 2012.
[2] R. Tibshirani, “What is Cox’s proportional hazards model?” Signif. (Oxf.), vol. 19, no. 2, pp. 38–39, Apr. 2022.
[3] C. A. Bellera, G. MacGrogan, M. Debled, C. T. de Lara, V. Brouste, and S. Mathoulin-Pélissier, “Variables with time-varying effects and the Cox model: Some statistical concepts illustrated with a prognostic factor study in breast cancer,” BMC Med. Res. Methodol., vol. 10, no. 1, p. 20, Mar. 2010.
[4] Z. Zhang, J. Reinikainen, K. A. Adeleke, M. E. Pieterse, and C. G. M. Groothuis-Oudshoorn, “Time-varying covariates and coefficients in Cox regression models,” Ann. Transl. Med., vol. 6, no. 7, p. 121, Apr. 2018.
[5] T. M. Therneau and P. M. Grambsch, Modeling Survival Data: Extending the Cox Model. New York: Springer, 2000.
[6] T. M. Therneau, A Package for Survival Analysis in R. 2024. Available: https://CRAN.R-project.org/package=survival